PSYC 2020-A01 / PSYC 6022-A01 | 2025-08-29 | Lab 2
Learning objectives:
R: Projects, functions
Statistics: Central tendency
Lab grading is moving from 0–100% to 10 points per lab
Top 10 lab assignments × 10 points each = 100 points (100%)
This does not change anything about how much the labs are weighted!
On lab computer
COS-GPU-2023
RStudio Projects: RStudio’s way of helping you organize files, scripts, etc.
I strongly recommend this!!
○ File > New Project
○ If you don’t already have a folder associated with this class, “New Directory”
○ If you do, “Existing Directory”
All R Scripts under the same project share a working directory
getwd() tells us the location of our working directory
setwd("C:/Users/Desktop/R Example") sets the working directory
Or, here::here() lets us use relative file paths (my favorite!)
○ Run here::here() at the top of your script to see where your project’s root directory is
○ You need to install the here package first: install.packages("here")
Then, when you need a file, you can reference it relatively
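A minimal sketch of the two approaches. The folder and file name (data/lab2_data.csv) are hypothetical placeholders, and the requireNamespace() check is an added convenience so the snippet still runs before here is installed; here itself does not require it.

```r
# Where is R currently reading and saving files?
getwd()

# Build a path to a hypothetical file, data/lab2_data.csv, relative to the project.
# here::here() starts from the project's root folder (install once with
# install.packages("here")).
if (requireNamespace("here", quietly = TRUE)) {
  data_path <- here::here("data", "lab2_data.csv")
} else {
  # Fallback if here is not installed yet: start from the working directory
  data_path <- file.path(getwd(), "data", "lab2_data.csv")
}
data_path

# Once the file actually exists, read it in with the relative path:
# lab_data <- read.csv(data_path)
```

Because the path is built from the project root, the same script works on the lab computer and on your own laptop without editing a hard-coded path like the one passed to setwd().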
Mean: Sum of all values divided by the total number of values
Median: When sorted lowest to highest, the middle value (or the average of the two middle values when there is an even number of values)
Mode: The value that appears most often
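The mean and median written as formulas, for n values x_1, …, x_n, where x_(1) ≤ … ≤ x_(n) denotes the same values sorted from lowest to highest. The mode has no comparable formula; it is simply the most frequent value.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\text{median} =
\begin{cases}
x_{((n+1)/2)} & \text{if } n \text{ is odd} \\[4pt]
\tfrac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even}
\end{cases}
```

The even case is why a median can land between two observed values.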
Given this dataset:
What is the mean? 2
What is the median? 2
What is the mode? 2
Given this dataset:
What is the mean? 1.5
What is the median? 1.5
What is the mode? No mode!
A function performs some operation on an input and produces some output
Saw this last week
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
What is the function? Input? Output?
We can calculate central tendencies in two ways:
Given this dataset, calculate the mean
By hand (using R as a calculator)
mean(x) function
x = vector of data
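Both routes on a small made-up vector (x below is hypothetical, not the dataset from the slide), so the two answers can be compared:

```r
# Hypothetical example data (not the dataset from the slide)
x <- c(2, 4, 4, 5, 10)

# "By hand": add up all the values, then divide by how many there are
mean_by_hand <- sum(x) / length(x)
mean_by_hand   # 5

# Built-in function: same answer
mean_builtin <- mean(x)
mean_builtin   # 5
```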
Given this dataset, calculate the median
By hand (using R as a calculator)
median(x) function
x = vector of data
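The same comparison for the median, again on a hypothetical vector. The "by hand" version sorts the values and picks the middle one, averaging the two middle values when the count is even:

```r
# Hypothetical example data (not the dataset from the slide)
x <- c(2, 4, 4, 5, 10)

# "By hand": sort the values, then take the middle one.
# With an even number of values, average the two middle ones instead.
sorted_x <- sort(x)
n <- length(sorted_x)
if (n %% 2 == 1) {
  median_by_hand <- sorted_x[(n + 1) / 2]
} else {
  median_by_hand <- (sorted_x[n / 2] + sorted_x[n / 2 + 1]) / 2
}
median_by_hand   # 4

# Built-in function: same answer
median(x)        # 4
median(c(1, 2))  # 1.5 (even n: average of the two middle values)
```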
Given this dataset, calculate the mode
With the mode() function
Doesn’t work :( R’s mode() returns the storage mode of an object (e.g., "numeric"), not the most common value
Have to create our own
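To see why mode() doesn’t help, here it is on a made-up vector:

```r
x <- c(2, 4, 4, 5, 10)  # hypothetical example data

# R's built-in mode() reports how R stores the object, not the most common value
mode(x)   # "numeric"
```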
We’ve seen some built-in R functions (e.g., mean(), median()), but we can also make our own
function_name <- function(argument) {
  # do some stuff with the argument (doubling it is just a placeholder)
  this_stuff <- argument * 2
  return(this_stuff)
}
ⓘ You don’t actually need to call return(); R automatically returns the value of the last evaluated expression
Then, you can call the function
function_name(specific_argument)
To keep the results, make sure to assign them to some variable
very_important_results <- function_name(specific_argument)
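A concrete version of the template, using a hypothetical function, to_percent(), that turns lab points into a percentage. It relies on R returning the last expression, so there is no return() call:

```r
# Hypothetical helper: convert points earned into a percentage of the total
to_percent <- function(points, total = 10) {
  points * 100 / total   # last expression, so R returns it automatically
}

to_percent(7)               # prints 70, but the result is not saved anywhere
lab_score <- to_percent(7)  # saved in a variable for later use
lab_score                   # 70
```

The default total = 10 means you only have to supply points for a 10-point lab, but you can override it, e.g. to_percent(5, total = 20).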
Let’s go back to finding the mode
Given this dataset, calculate the mode
[1] 2 3 12 4 4
How does this work?
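The mode function from class is not reproduced here, so this is one possible sketch (get_mode() is a hypothetical name, and the version shown in lab may differ). It counts each unique value with table() and keeps the value(s) with the highest count, treating the values printed above as the data:

```r
# One possible mode function (a sketch, not necessarily the one from class).
# Returns every value tied for most frequent, or NA if all values are equally common.
get_mode <- function(x) {
  counts <- table(x)                              # how often each unique value occurs
  modes  <- names(counts)[counts == max(counts)]  # value(s) with the highest count
  if (length(counts) > 1 && length(modes) == length(counts)) {
    return(NA)                                    # every value ties: no mode
  }
  as.numeric(modes)                               # table() names are text; convert back
}

get_mode(c(2, 3, 12, 4, 4))  # 4
get_mode(c(1, 2))            # NA (no mode)
get_mode(c(1, 1, 2, 2, 3))   # 1 2 (two modes)
```

Returning all tied values, rather than only the first, keeps datasets with two modes from being silently reported as having one.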
It takes time to look at all of these for a lot of variables, even with functions
The summary(object) function provides us a quick overview of this information
object = for our purposes, a dataframe
What all do we get?
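A quick look using the built-in iris data frame. For numeric columns, summary() reports the minimum, first quartile, median, mean, third quartile, and maximum; for a factor such as Species it reports how many rows fall in each level:

```r
# summary() on a whole data frame: one block of statistics per column
summary(iris)

# It also works on a single variable
sepal_summary <- summary(iris$Sepal.Length)
sepal_summary
names(sepal_summary)  # which statistics were computed
```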
Summary statistics are great, but don’t trust them alone!
What do you think a dataset with these descriptives would look like?
https://www.research.autodesk.com/publications/same-stats-different-graphs/
Don’t rush: graph your data!
What should graphs do?
An example of (simulated) SAT scores
What do we see here?

Positive skew, right-tailed
The mass of the distribution is concentrated on the left of the figure
Negative skew, left-tailed
The mass of the distribution is concentrated on the right of the figure
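Skew also connects back to central tendency: the long tail pulls the mean toward it, so in a right-skewed distribution the mean usually sits above the median (and below it for left skew). A sketch with simulated exponential data; the seed and sample size are arbitrary choices:

```r
set.seed(2020)                        # arbitrary seed so the simulation is reproducible
right_skewed <- rexp(1000, rate = 1)  # long right tail (positive skew)
left_skewed  <- -rexp(1000, rate = 1) # mirror image: long left tail (negative skew)

c(mean = mean(right_skewed), median = median(right_skewed))  # mean > median
c(mean = mean(left_skewed),  median = median(left_skewed))   # mean < median

hist(right_skewed, breaks = 30, main = "Positive (right) skew")
hist(left_skewed,  breaks = 30, main = "Negative (left) skew")
```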
R has some plotting features built in (we saw this last week)
Better… (thanks, ChatGPT)

plot(iris$Sepal.Length, iris$Sepal.Width,
     pch = 19,                 # solid circles
     col = "#377EB8",          # pleasant blue
     cex = 1.3,                # slightly larger points
     xlab = "Sepal Length",    # cleaner label
     ylab = "Sepal Width",
     main = "Sepal Length vs Sepal Width",
     cex.lab = 1.2,            # bigger axis labels
     cex.main = 1.4,           # bigger title
     font.main = 2)            # bold title
We will learn a few plots in base R, and then we will learn a better way of making plots: ggplot2
hist() function
x = vector (variable) you want to plot (remember the $ operator!)
breaks = number of bins (R treats a single number as a suggestion) or a vector of bin boundaries
main = title
xlab = label for x-axis
ylab = label for y-axis
col = color for bars
xlim = range for x-axis
ylim = range for y-axis
prob = TRUE/FALSE; TRUE plots density (total bar area = 1) instead of frequency
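The arguments above, put together on the built-in iris data (the bin count, color, and axis ranges are arbitrary choices for illustration):

```r
# Histogram of one variable from the built-in iris data, using the arguments above
h <- hist(iris$Sepal.Length,
          breaks = 15,                           # suggested number of bins
          main   = "Distribution of Sepal Length",
          xlab   = "Sepal Length (cm)",
          ylab   = "Frequency",
          col    = "lightblue",
          xlim   = c(4, 8),
          ylim   = c(0, 40))

# Same variable on the density scale: bar areas now add up to 1
d <- hist(iris$Sepal.Length, breaks = 15, prob = TRUE,
          main = "Sepal Length (density scale)", xlab = "Sepal Length (cm)")
```

hist() also returns the bin boundaries and counts invisibly, so saving the result (h, d) lets you inspect exactly which bins R chose.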
An important decision for histograms is the number (or width) of bins
Specified with the breaks argument
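A quick way to see how much this choice matters, as a sketch on simulated data (the seed and the three bin settings are arbitrary): the same 500 values plotted with few, moderate, and many bins.

```r
set.seed(2020)        # arbitrary seed so the simulated data are reproducible
x <- rnorm(500)       # simulated data

par(mfrow = c(1, 3))  # three histograms side by side
h_few  <- hist(x, breaks = 5,  main = "breaks = 5")
h_some <- hist(x, breaks = 30, main = "breaks = 30")
# A vector of boundaries fixes the bin width exactly (here 0.1)
h_many <- hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = 0.1),
               main = "bin width = 0.1")
par(mfrow = c(1, 1))  # back to one plot per window
```

Too few bins hide the shape; too many turn it into noise. Passing a vector of boundaries is the way to force an exact bin width, since a single number is only a suggestion.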


If we had enough data to make the bins infinitesimally small, the density-scaled histogram would approach a probability density function (PDF)

library(ggplot2)  # ggplot(), geom_histogram(), stat_function(), etc.
                  # (cowplot must also be installed for plot_grid())

dat <- data.frame(x = rnorm(1000))
cowplot::plot_grid(
  plotlist = list(
    ggplot(dat, aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)),
                     bins = 10) +
      coord_cartesian(xlim = c(-4, 4), ylim = c(0, .5)) +
      labs(y = "Density") +
      cowplot::theme_cowplot(),
    ggplot(dat, aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)),
                     bins = 50) +
      coord_cartesian(xlim = c(-4, 4), ylim = c(0, .5)) +
      labs(y = "Density") +
      cowplot::theme_cowplot(),
    ggplot(dat, aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)),
                     bins = 100) +
      coord_cartesian(xlim = c(-4, 4), ylim = c(0, .5)) +
      labs(y = "Density") +
      cowplot::theme_cowplot(),
    ggplot(dat, aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)),
                     bins = 100, alpha = .8) +
      # overlay the standard normal PDF
      stat_function(inherit.aes = FALSE, fun = dnorm,
                    n = 101, args = list(mean = 0, sd = 1),
                    xlim = c(-4, 4),
                    color = "grey35", linewidth = 2) +
      coord_cartesian(xlim = c(-4, 4), ylim = c(0, .5)) +
      labs(y = "Density") +
      cowplot::theme_cowplot()
  ),
  nrow = 1)
https://jhelmer3.github.io/PSYC2020L/
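For comparison, roughly the same idea in base R, as a sketch: hist() with prob = TRUE puts the bars on the density scale, and curve() with add = TRUE draws the standard normal PDF (dnorm()) on top. The seed and bin count are arbitrary:

```r
set.seed(2020)    # arbitrary seed for reproducibility
x <- rnorm(1000)  # simulated data, like dat$x above

# Density-scaled histogram with many narrow bins
h <- hist(x, breaks = 100, prob = TRUE, xlim = c(-4, 4),
          main = "Histogram vs. normal PDF", xlab = "x", col = "grey80")

# Overlay the standard normal density curve
curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4,
      add = TRUE, lwd = 2, col = "grey35")
```

prob = TRUE matters here: on the frequency scale the bars would be far taller than the PDF, and the curve would not line up with them.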